Method and system for dynamically allocating servers to compute-resources using capacity thresholds

ABSTRACT

Servers are allocated for use in one of a plurality of compute-resources or for stand-by storage in a free-pool. Server load metrics are selected (e.g., ping-reply time or CP utilization) for measuring load in the servers. Metrics are measured for the servers allocated to the compute-resources. Several metrics can be measured simultaneously. The metrics for each compute-resource are normalized and averaged. Then, the metrics for each compute-resource are combined using weighting coefficients, producing a global load value, G, for each compute-resource. The G value is recalculated at timed intervals. Upper and lower thresholds are set for each compute-resource, and the G values are compared to the thresholds. If the G value exceeds the upper threshold, then a server in the free-pool is reallocated to the compute-resource; if the G value is less than the lower threshold, then a server is moved from the compute-resource to the free-pool.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to compute-resources (sets of servers that are logically and physically isolated from one another for the purpose of security and dedicated usage) and methods for allocating servers between compute-resources based on a new capacity threshold. More specifically, the present invention relates to a method for setting capacity thresholds, monitoring the computation load on each compute-resource, and reallocating servers when thresholds are exceeded.

2. Background Description

Compute-resources are commonly used for applications supporting large numbers of users, and those that are central processor unit (CPU) intensive and highly parallelizable. Examples of such compute-resources include web-applications hosted by Internet service providers (ISPs), and many scientific applications in areas such as Computational Fluid Dynamics. Often in such computing environments, load can vary greatly over time, and the peak to average load ratios are large (e.g., 10:1 or 20:1). When the load on a customer site drops below a threshold level, one of its servers is quiesced (removed from service), “scrubbed” of any residual customer data, and assigned to a “free-pool” of servers that are ready to be assigned. Later, when the load on another customer exceeds some trigger level, a server from the free-pool is primed with the necessary operating system (OS), applications, and data to acquire the personality of that customer application. Currently, there are few systems that support dynamic allocation of servers. Those that do exist depend on manually derived thresholds and measures of normal behavior to drive changes in resource allocation. There are no automated, effective, and efficient methods for determining when a particular compute-resource is overloaded or under loaded that are relatively independent of application modifications.

Parallel computing and Server-Farm facilities would benefit greatly from an automatic method for monitoring available capacity on each compute-resource, and allocating servers accordingly. Such a system would provide more efficient use of servers, allowing groups of compute-resources to provide consistent performance with a reduced number of total servers. Such a system would be particularly applicable to large ISPs, which typically have many compute-resources that each experience significant changes in computing load.

SUMMARY OF THE INVENTION

According to the present invention, a method and system dynamically allocate servers among a plurality of connected server compute-resources and a free-pool of servers. Each server compute-resource comprises a plurality of servers. Each server allocated to a compute-resource is monitored for at least one metric. For each monitored metric and for each compute-resource, a normalized average metric value P is calculated, and for each compute-resource, a global load value G is calculated. This global load value is a linear combination of normalized average metric values. For each compute-resource, a lower and an upper threshold for the global load value are defined. The calculated global load value G is compared to the lower and the upper thresholds. If a compute-resource has a global load value G which is greater than the upper threshold, it is declared overloaded and a server is removed from the free-pool and allocated to the overloaded compute-resource. If the compute-resource has a global load value G which is less than the lower threshold, it is declared under loaded and a server is removed from it and allocated to the free-pool. If there is an under loaded compute-resource with a global load value G less than the lower threshold and an overloaded compute-resource with a global load value G greater than the upper threshold, then a server is removed from the under loaded compute-resource and allocated to the overloaded compute-resource.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

FIG. 1 is a block diagram showing two connected compute-resources and a free-pool according to an exemplary embodiment of the invention;

FIG. 2 is a graph showing a plot of response time versus load for a particular server or compute-resource;

FIG. 3 is a graph showing a plot of global load values versus time, illustrating the method of the present invention;

FIG. 4 is a graph showing a plot of response time versus load illustrating prediction bounds;

FIGS. 5A, 5B and 5C, taken together, are a flow chart illustrating the process of the method according to the invention; and

FIG. 6 is a block diagram illustrating a system of heterogeneous compute-resources and the various system components which implement the method of allocating servers according to the invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

The present invention provides a method for computing the maximum load on a compute-resource and for allocating resources among a plurality of compute-resources in a manner that prevents each compute-resource's maximum from being exceeded. More specifically, this invention embodies a method to derive a Maximum-Load Vector for each compute-resource and to build allocation threshold equations based on the computed current and maximum load.

As an illustrative example, we will show how these thresholds can be used to drive server allocations in a hosted environment. Servers, or more generically resources, are allocated according to the load on each compute-resource. In the example environment, each server is assigned to one compute-resource or to a free-pool. Servers assigned to a compute-resource are used for functions specific to the compute-resource (e.g., each compute-resource can be used for a particular website, used for a particular application, or used for a particular division within a company). Servers assigned to the free-pool are typically idle but available for allocation to one of the compute-resources. If a compute-resource becomes overloaded (i.e., if the load on the compute-resource rises above the RT-Transition Point), then a server from the free-pool is allocated to the overloaded compute-resource. If a compute-resource becomes under loaded (i.e., if the load on the compute-resource decreases below a pre-established threshold), then a server from the under loaded compute-resource is allocated to the free-pool. In this way, computing capacity in each compute-resource is adjusted in response to changing load demands.

In the present invention, the compute-resources are monitored for signs of overloading or under loading. Monitoring is performed by measuring selected load metrics, e.g., ping-reply time or central processor (CP) utilization (the percentage of a resource's capacity which is actually used, over some period of time) or other metrics indicative of the load. The metric values are normalized, smoothed, and averaged over each compute-resource. The resulting metric values are used to determine if a compute-resource is overloaded or under loaded. Several metrics may be used in combination, with individual weighting factors for each.

Referring now to the drawings, and more particularly to FIG. 1, there are illustrated two compute-resources (generally indicated as Network A and Network B and referred to herein as compute-resources A and B) and a free-pool 18 according to the illustrative example. In the specific illustration of FIG. 1, compute-resource A has four servers 20 a-20 d, compute-resource B has three servers 22 a-22 c, and the free-pool 18 has two servers 24 a-24 b. The servers 20, 22 and 24 may be logically isolated from one another through programmable hardware such as a virtual local area network (LAN). Any of the servers 20, 22 and 24 can be reallocated to the free-pool. The free-pool can have zero to any number of servers. Servers can be reallocated manually or automatically using remotely configurable hardware such as virtual LANs.

Compute-resource A and compute-resource B each support distinct computing tasks. For example, compute-resource A and compute-resource B can each support different websites or parallel applications like ray tracing.

FIG. 2 shows a plot of response time versus compute-resource load. Load is defined as the percentage of allocated compute resources consumed by the current set of processes being run on the allocated resources. FIG. 2 illustrates that the end-user response time increases very little in response to changes in load on the target server or compute-resource (the server or compute-resource the end-user request is being run on), until the server or compute-resource reaches its saturation point (utilization is close to 100%). Once saturation is reached, response time (RT) will degrade exponentially in response to changes in load. This phenomenon is partially caused by request dropping. Request dropping triggers request retransmission, which rapidly escalates the overload condition. Once the response time starts to degrade exponentially, one can expect performance to deteriorate rapidly if load continues to climb. Hence, in the present invention, it is desirable for all the servers and compute-resources to have loads that are less than the load at which saturation will occur. In other words, the load should be limited to an amount indicated at 28 a or less for the idealized server or compute-resource represented by FIG. 2. We call this the response-time-transition point (RT transition).

It is important to note that different server types will become saturated at different levels of load. In other words, the curve can move to the left or to the right, and its slope may vary. Thus, each application and server-type pair will have its own RT curve.

Of course, the load on a server or compute-resource cannot be measured directly. Instead, metrics are used to indirectly measure the load, so that the load can always be maintained below the RT transition limit 28 a. In operation, we assume there is a management system that collects the required monitoring metrics on each server, and makes allocation requests using the methods described here or some other method. In the example system, the decision to donate or request a server is made independently for each compute-resource. Thus, each compute-resource can be self-managed (in a trusted environment), centrally managed, or managed by a distributed management system. Alternate schemes that use coordinated allocation decision making can also be used.

In the present invention, the monitoring system measures predetermined metrics that are indicative of the load on each server and the load on each compute-resource. In combination, the metrics can be used to approximate load, which itself cannot be captured by a single metric. Several metrics useful in the present invention include:

-   Ping-reply time (HTTP head ping-reply): The time required for a server to reply to an empty request, i.e., one that does not include any server processing time. The ping-reply time is a reasonable measure of TCP stack delay and is a very good indicator of load.
-   Central processor (CP) utilization: The percentage of time that a machine's processors are busy. The CP utilization metric is typically expressed as a percentage value.
-   Mbufs denied: The number of message buffer requests denied. This metric corresponds to the number of dropped packets on the network.
-   SkBuf: The number of socket buffers actively being used in a particular server. This metric correlates well with ping-reply time.

Some of the other metrics known in the art that can be used with the present invention include request-arrival rate, transmission control protocol (TCP) drop rate, active-connections, and request processing time (end-user response time minus the time spent in the network and in queues).

In the present invention, the metrics are measured for each server. Preferably, several complementary metrics are used simultaneously. Generally, it is preferred for the metric to be somewhat insensitive to changes in the application or traffic mix. Ping-reply time and SkBuf are the most robust in this way.

In the method of the present invention, N metrics may be used; each metric is designated n₁, n₂, n₃, etc. Each compute-resource has S servers, with each server designated as s₁, s₂, s₃, etc. The servers in a compute-resource can all be of the same type or of different types.

In the present invention, each compute-resource has a maximum value for each metric on each server type supported. This is illustrated in FIG. 2. Specifically, the maximum metric value is the average observed value (over several runs) of the metric when the response time reaches the RT-transition point. For example, if the RT-transition point is 1.7x (x being the RT of an unloaded machine), then the maximum metric value will be the metric value that corresponds with a response time of 1.7x.
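By way of illustration only, the averaging over several runs might be sketched as follows in Python. The function names, the representation of a run as (metric value, response time) pairs ordered by increasing load, and the rule of taking the first sample at or above the transition point are all assumptions made for this sketch and are not part of the described method.

```python
from statistics import mean


def metric_at_transition(run, unloaded_rt, factor=1.7):
    """Return the metric value at the first sample whose response time
    reaches factor * unloaded_rt, or None if the run never gets there.

    `run` is a list of (metric_value, response_time) pairs ordered by
    increasing load (an assumed data layout).
    """
    for metric_value, response_time in run:
        if response_time >= factor * unloaded_rt:
            return metric_value
    return None


def max_metric_value(runs, unloaded_rt, factor=1.7):
    """Average the transition-point metric value over several runs."""
    values = [metric_at_transition(run, unloaded_rt, factor) for run in runs]
    values = [v for v in values if v is not None]
    if not values:
        raise ValueError("no run reached the RT-transition point")
    return mean(values)


# Two hypothetical runs on one server type; the unloaded RT is 1.0.
runs = [
    [(10, 1.0), (40, 1.2), (80, 1.7)],
    [(10, 1.0), (50, 1.3), (90, 1.8)],
]
mn_t = max_metric_value(runs, unloaded_rt=1.0)
```

Here the two runs reach the transition point at metric values of 80 and 90, so the sketch reports their average as the maximum metric value for that server type.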

In the present invention, the maximum metric value for metric n (n is the metric index) on a server of type t in a compute-resource is Mn_(t). The response time of each server type will respond uniquely to changes in the metric value. Therefore, each server type has a separate maximum metric value Mn_(t) for each metric. Mn_(t) will typically be determined empirically under a typical load mix for the compute-resource, by measuring the metric when the load is that recorded at the RT-Transition Point.

In the present invention, it is necessary to define a standard server, indicated herein by “std”. A standard server can be a server of any server type found in the target compute-resource. A maximum load vector for the compute-resource being profiled is defined and is given in terms of standard server maximums:

$MaxLV_{compute\_resource} = \left( M1_{std}, \ldots, MN_{std} \right)$

In performing calculations according to the present invention, all values are converted into standard server units. For example, if std has a maximum CPU utilization of 45% and servers of type t have a maximum CPU utilization of 90%, a CPU utilization of 45% on a server of type t is equivalent to 22.5% on the standard server, which is 50% of the maximum on the standard server. The maximum value for metric n for the standard server is given by Mn_(std). For any other server, s, the maximum value for metric n is given by Mn_(s), and is dependent on the type of the server.

In order to combine metrics from heterogeneous servers, a capacity weight for each unique server type t and compute-resource must be computed. The metric capacity weight roughly corresponds to how many standard servers the server type in question is equivalent to for each of the metrics used to measure load. For a given compute-resource, the capacity weight for the nth metric for servers of type t is:

${MWn}_{t} = \frac{{Mn}_{t}}{{Mn}_{std}}$
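The capacity weight and the conversion into standard server units can be sketched in Python as shown below. This is a minimal illustration; the function names are hypothetical, and dividing a measured value by the capacity weight is an assumption about how a single reading is expressed in standard server units.

```python
def capacity_weight(max_metric_type, max_metric_std):
    """MWn_t = Mn_t / Mn_std for one metric and one server type."""
    if max_metric_std <= 0:
        raise ValueError("standard-server maximum must be positive")
    return max_metric_type / max_metric_std


def to_standard_units(measured_value, weight):
    """Express a reading from a type-t server in standard server units
    (assumed here to be the reading divided by the capacity weight)."""
    return measured_value / weight


# Standard server saturates at 45% CPU utilization, type t at 90%.
mw_cpu_t = capacity_weight(90.0, 45.0)         # type t counts as 2 standard servers
reading_std = to_standard_units(60.0, mw_cpu_t)  # a 60% reading on type t
```

With these maxima, a server of type t counts as two standard servers for the CPU utilization metric, so a 60% reading on type t corresponds to 30% in standard server units.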

In the present invention, the metrics are collected from each server in each compute-resource periodically. The metrics can be collected centrally at predefined timed intervals, or can be maintained locally and forwarded to the resource allocation component only when their RT-transition point is reached. Preferably, the metrics are measured every 1-10 minutes, although they can be measured more or less often as needed.

The present measured value is updated every time the metric is measured. The present value Pn_(s) may be noisy and highly variable for reasons other than persistent changes in the amount of load (e.g., changes in the request processing path, or temporary spikes in the load). Therefore, the present measured value Pn_(s) should be smoothed according to well known data smoothing or filtering techniques (e.g., double exponential smoothing).
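As one possible illustration of such smoothing, the following Python sketch applies double exponential smoothing in the common level-plus-trend (Holt) form. The smoothing constants `alpha` and `beta`, the initialization from the first two samples, and the function name are assumptions chosen for the example; the text does not prescribe them.

```python
def double_exponential_smoothing(values, alpha=0.5, beta=0.3):
    """Return the smoothed series for `values` using a level and a trend.

    alpha weights the newest observation in the level update;
    beta weights the newest level change in the trend update.
    """
    if not values:
        return []
    if len(values) == 1:
        return [values[0]]
    level = values[0]
    trend = values[1] - values[0]
    smoothed = [level]
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed


# A temporary spike in an otherwise flat metric is damped, not passed through.
raw = [10, 10, 10, 50, 10, 10]
smooth = double_exponential_smoothing(raw)
```

In this example the raw spike of 50 appears in the smoothed series as a much smaller excursion, which is the behavior wanted before the values are compared against thresholds.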

The present value for each metric is first smoothed and then combined across the compute-resource's active server set to create a normalized average compute-resource metric value, P_(n). The normalized, smoothed average metric value P_(n) is:

$P_{n} = \frac{\sum\limits_{s \in S} measured\_valueM_{n(s)}}{\sum\limits_{s \in S} {MWn}_{t(s)}}$

where measured_valueM_(n(s)) is the smoothed present value of metric n on server s, MWn_(t(s)) is the capacity weight for metric n of the server type t(s) of server s, and m is the number of servers in S. Compute-resource A and compute-resource B each have normalized and smoothed average metric values PAn and PBn for each metric. For example, if compute-resource A and compute-resource B are monitored using three metrics (e.g., ping-reply time (n=1), CP utilization (n=2), and Mbuf requests (n=3)), then compute-resource A will have three metric values (PA₁, PA₂, PA₃), and compute-resource B will have three metric values (PB₁, PB₂, PB₃).
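The calculation of P_(n) for one metric can be sketched as follows. The dictionary-based data layout, the server names, and the server type labels are hypothetical and serve only to make the example self-contained.

```python
def normalized_average(measured_by_server, type_of_server, weight_by_type):
    """P_n: sum of smoothed readings over the active server set, divided
    by the sum of the capacity weights of those same servers."""
    if not measured_by_server:
        raise ValueError("the active server set is empty")
    total_measured = sum(measured_by_server.values())
    total_weight = sum(
        weight_by_type[type_of_server[server]] for server in measured_by_server
    )
    return total_measured / total_weight


# Three servers: two of the standard type and one counted as two standard servers.
measured = {"s1": 30.0, "s2": 60.0, "s3": 30.0}
types = {"s1": "std", "s2": "big", "s3": "std"}
weights = {"std": 1.0, "big": 2.0}
p_n = normalized_average(measured, types, weights)
```

The three servers together are equivalent to four standard servers, so the total reading of 120 yields an average of 30 per standard server.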

Next, the metric values (e.g., (PA₁, PA₂, PA₃) and (PB₁, PB₂, PB₃)) are divided by their corresponding maximum metric value. This gives the percentage of the maximum metric value that each present metric value represents. This array is called the Current Percent of Maximum Load Vector (% CurrMLV), and is given by:

$\%CurrMLV = \left( \frac{P_{1}}{M1_{std}}, \ldots, \frac{P_{N}}{MN_{std}} \right)$

We can then define a single site Load value that represents the aggregate load on a compute-resource as the sum of the % CurrMLV values multiplied by weighting coefficients (C₁, C₂, C₃) to produce a global load value G for each compute-resource:

-   For compute-resource A: G_(A)=C₁% CurrMLV_(A1)+C₂% CurrMLV_(A2)+C₃% CurrMLV_(A3)
-   For compute-resource B: G_(B)=C₁% CurrMLV_(B1)+C₂% CurrMLV_(B2)+C₃% CurrMLV_(B3)

This resultant load value is an approximation of the percent of the maximum that the current load represents.

Formally, the compute-resource load is given as follows. Let C_(n) be the metric weight of the nth metric, between 0 and 1. This value determines how much the measured metric contributes to Load. Then:

$\sum\limits_{n = 1}^{N} C_{n} = 1$

$Load = \sum\limits_{n = 1}^{N} \left( C_{n} \cdot \%CurrMLV_{n} \right)$
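The two formulas above can be illustrated with a short Python sketch. The function names and sample values are hypothetical; fractions (0.5 rather than 50%) are used for the percent-of-maximum values purely for convenience.

```python
def current_percent_of_max(p_values, std_maxima):
    """% CurrMLV: each P_n divided by the standard-server maximum Mn_std."""
    if len(p_values) != len(std_maxima):
        raise ValueError("one maximum is required per metric")
    return [p / m for p, m in zip(p_values, std_maxima)]


def global_load(percent_of_max, coefficients):
    """Load (G): linear combination of % CurrMLV with weights summing to 1."""
    if len(percent_of_max) != len(coefficients):
        raise ValueError("one coefficient is required per metric")
    if any(c < 0 or c > 1 for c in coefficients):
        raise ValueError("each coefficient must lie between 0 and 1")
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    return sum(c * v for c, v in zip(coefficients, percent_of_max))


# Three metrics for compute-resource A (sample values only).
curr_mlv_a = current_percent_of_max([30.0, 45.0, 10.0], [60.0, 90.0, 40.0])
g_a = global_load(curr_mlv_a, [0.5, 0.25, 0.25])
```

In this example the three metrics stand at 50%, 50% and 25% of their maxima, and the weighted combination places compute-resource A at 43.75% of its maximum load.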

The weighting coefficients C₁ . . . C_(n) are selected according to which metrics are most reliable and accurate. If a metric is highly reliable and accurate for determining load, then its associated weighting coefficient should be correspondingly large. In other words, the magnitude of a coefficient should be commensurate with the quality of its associated metric. The weighting coefficients C₁, C₂, C₃ allow several unreliable and fallible metrics to be combined to create a relatively reliable measurement of load. Values for the coefficients C₁, C₂, C₃ can be selected by a compute-resource administrator or by software based on the types of load. If various combinations of the metrics are reliable, more than one G value can be defined. For example, if metric 1 alone is reliable and metrics 2 and 3 in combination are reliable, we can define G_(A1) with coefficients {1, 0, 0} and G_(A2) with coefficients {0, 0.5, 0.5}. In this case, we will flag a threshold violation if either one of these values exceeds the threshold set for the compute-resource.
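The use of more than one coefficient set can be sketched as follows, with a violation flagged when any resulting G value exceeds the upper threshold. The function name and the sample readings are hypothetical.

```python
def any_upper_violation(percent_of_max, coefficient_sets, upper_threshold):
    """Return True if the G value of any coefficient set exceeds the threshold."""
    for coefficients in coefficient_sets:
        g = sum(c * v for c, v in zip(coefficients, percent_of_max))
        if g > upper_threshold:
            return True
    return False


# G_A1 uses metric 1 alone; G_A2 uses metrics 2 and 3 in equal parts.
coefficient_sets = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.5)]

overloaded = any_upper_violation([0.95, 0.40, 0.60], coefficient_sets, 0.9)
normal = any_upper_violation([0.50, 0.40, 0.60], coefficient_sets, 0.9)
```

In the first call metric 1 alone is above the threshold, so a violation is flagged even though the combination of metrics 2 and 3 is not; in the second call neither G value exceeds the threshold.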

If only one metric is used, then the weighting coefficients and linear combination calculation are not necessary. In this case, global load values G_(A) and G_(B) are equal to the normalized average metric values P_(A) and P_(B):

-   For compute-resource A: G_(A)=P_(A), and
-   for compute-resource B: G_(B)=P_(B),

when a single metric is used.

The global load values G_(A) and G_(B) are used in the present invention to determine when servers should be reallocated.

In the present invention, upper (as a function of the maximum server load) and lower (as a function of the upper) global load value thresholds are set for each compute-resource. In operation, each time the global load values G_(A) and G_(B) are measured, they are compared to the thresholds. When G exceeds an upper threshold for a specified time, a compute-resource is considered overloaded and a server from the free-pool is reallocated to the overloaded compute-resource. Similarly, when G is less than a lower threshold, a compute-resource is considered under loaded and a server from the under loaded compute-resource is reallocated to the free-pool.

This process is illustrated in FIG. 3, which shows plots of global load values G_(A) and G_(B) versus time. Compute-resource A has lower threshold 31 and upper threshold 33, while compute-resource B has lower threshold 30 and upper threshold 32.

At time 1, G_(A) drops below the lower threshold 31. Compute-resource A is under loaded. Consequently, a server from compute-resource A is reallocated to the free-pool.

At time 2, G_(B) exceeds the upper threshold 32. Compute-resource B is overloaded. Consequently, a server from the free-pool is reallocated to compute-resource B.

At time 3, G_(A) exceeds the upper threshold 33. Compute-resource A is overloaded. Consequently, a server from the free-pool is reallocated to compute-resource A.

At time 4, G_(B) drops below the lower threshold 30. Compute-resource B is under loaded. Consequently, a server from compute-resource B is reallocated to the free-pool.

In this way, servers in the compute-resources are reallocated according to where they are needed most, and reallocated to the free-pool if they are not needed. When loads are light, the free-pool maintains a reserve of idle servers that can be reallocated to any compute-resource.
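One comparison round of the kind illustrated in FIG. 3 might be sketched as follows. This is a simplified illustration: the function name, the use of server counts rather than individual servers, and the rule that a compute-resource always keeps at least one server are assumptions made for the example.

```python
def rebalance(server_counts, g_values, thresholds, free_pool):
    """Apply one comparison round and return (new_counts, new_free_pool).

    thresholds maps each compute-resource to its (lower, upper) pair.
    """
    counts = dict(server_counts)
    for name, g in g_values.items():
        lower, upper = thresholds[name]
        if g > upper and free_pool > 0:
            # Overloaded: take one server from the free-pool.
            counts[name] += 1
            free_pool -= 1
        elif g < lower and counts[name] > 1:
            # Under loaded: return one server to the free-pool.
            counts[name] -= 1
            free_pool += 1
    return counts, free_pool


# Starting point of FIG. 1: A has four servers, B has three, free-pool has two.
counts, pool = rebalance(
    {"A": 4, "B": 3},
    {"A": 0.20, "B": 0.95},
    {"A": (0.30, 0.90), "B": (0.30, 0.90)},
    free_pool=2,
)
```

In this round compute-resource A is under loaded and gives up a server while compute-resource B is overloaded and receives one, leaving A with three servers, B with four, and two servers in the free-pool.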

It is important to note that reallocating a server to or from a compute-resource will slowly change the G value of a compute-resource as load is shifted to or from the added or removed server. A newly added server's metric values are not added to the G value until it has had a chance to take over its portion of the compute-resource's total load.

When deciding to add additional capacity, one has to take into account the current number of resources. Adding an additional server to a set of two is not the same as adding an additional server to a set of one hundred. A load of 90% of the maximum may be fine when you have a server set of one hundred, but may be too high when it contains only three servers. This argument also applies to resources of different capacities. For example, a CPU utilization that is 90% of the maximum does not have the same implications for processors with different clock rates (e.g., 600 and 1500 MHz). To account for these differences in excess capacity, we can provide a threshold range, and then compute our current threshold based on the current capacity. We may want to have a CPU utilization threshold that is between 70% and 90%. Once we have ten or more servers, we will use the 90% threshold. If we have between one and ten servers, we set the threshold to a value between 70% and 90%. The increment to be added to the threshold is simply set to the threshold range divided by the number of resources the build-up is to occur over, giving us:

$Threshold\_Increment = \frac{Threshold\_High - Threshold\_Low}{Size\_Growth\_Interval}$

The following code snippet shows how the actual threshold values are adjusted during execution.

    IF (current#Servers < min#Servers + Size_Growth_Interval) {
        current_Adjustment = (current#Servers − min#Servers) * Threshold_Increment;
        allocationThresholdValue = minAllocationThresholdValue + current_Adjustment;
        deallocationThresholdValue = minDeallocationThresholdValue − current_Adjustment;
    }
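A runnable rendering of this adjustment is sketched below in Python. It rests on stated assumptions: the adjustment is taken to be the number of servers above the minimum multiplied by Threshold_Increment, the adjustment is held at its full value once the growth interval has been covered, and all function and parameter names are hypothetical.

```python
def adjusted_thresholds(current_servers, min_servers, size_growth_interval,
                        threshold_low, threshold_high,
                        min_deallocation_threshold):
    """Return (allocation_threshold, deallocation_threshold)."""
    if size_growth_interval <= 0:
        raise ValueError("size_growth_interval must be positive")
    increment = (threshold_high - threshold_low) / size_growth_interval
    # Servers above the minimum, limited to the growth interval (assumption).
    steps = min(max(current_servers - min_servers, 0), size_growth_interval)
    adjustment = steps * increment
    allocation = threshold_low + adjustment
    # The subtraction follows the snippet above.
    deallocation = min_deallocation_threshold - adjustment
    return allocation, deallocation


# Threshold range of 70% to 90% built up over ten servers, starting from one.
alloc_6, dealloc_6 = adjusted_thresholds(6, 1, 10, 0.70, 0.90, 0.50)
alloc_50, _ = adjusted_thresholds(50, 1, 10, 0.70, 0.90, 0.50)
```

With six servers the allocation threshold sits halfway through the range, at about 80%; with fifty servers it rests at the 90% ceiling.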

Selecting the size and type of server to allocate will depend on a number of factors, including the length of time the server is expected to be needed, and how high the load may go. Such predictions of future load are not covered in this paper, but can be found in the open literature.

To prevent thrashing (i.e., repeatedly allocating and de-allocating servers), the server de-allocation process should be disabled for the given site for a fixed period of time after a server allocation is performed. Additionally, the de-allocation threshold should be chosen carefully. For example, assume that the maximum server load is reached for a single server site at 300 requests/sec. After an additional server is added (of equal capacity), each server will receive approximately 150 requests/sec. In this case, the de-allocation process should not be triggered unless there are fewer than 150 requests/sec being routed to each of the allocated servers. In general, no server should be de-allocated unless:

$Curr\_Total_{req/sec} < \frac{Server\_Max_{req/sec} \cdot \left( N - 1 \right)}{N} - \left( Server\_Max_{req/sec} - Server\_Min_{req/sec} \right)$

-   Curr_Total_(req/sec): The total number of requests per second currently being received by the site.
-   Server_Max_(req/sec): The maximum number of requests per second that the standard server can handle.
-   N: The normalized number of standard servers currently allocated, i.e., units of compute capacity.
-   DeAllo_Buff_size: The number of requests below the maximum that should trigger a server de-allocation.

To ensure that normal fluctuations in request rates do not trigger resource rebalancing, the Curr_Total_(req/sec) value should be smoothed. We were able to eliminate thrashing using this de-allocation function.
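The de-allocation test can be written directly from the inequality as stated. The sketch below does so in Python; the function and parameter names are hypothetical, the sample value for Server_Min is invented for the example, and the smoothing of the current request rate is assumed to have been done beforehand.

```python
def may_deallocate(curr_total, server_max, server_min, n):
    """Evaluate the de-allocation inequality as written in the text.

    curr_total: smoothed total requests per second received by the site
    server_max, server_min: the Server_Max and Server_Min terms
    n: normalized number of standard servers currently allocated
    """
    if n <= 0:
        raise ValueError("n must be positive")
    limit = server_max * (n - 1) / n - (server_max - server_min)
    return curr_total < limit


# Sample values: Server_Max of 300 req/sec, Server_Min of 280 req/sec, N = 2.
allowed = may_deallocate(120.0, 300.0, 280.0, 2)
blocked = may_deallocate(140.0, 300.0, 280.0, 2)
```

With these sample values the right-hand side of the inequality evaluates to 130, so a smoothed rate of 120 permits de-allocation and a rate of 140 does not. With a single allocated server the right-hand side is never positive, so the last server is never released.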

Preferably in the invention, the global load values G_(A) and G_(B) are smoothed so the thresholds 30, 31, 32, and 33 (FIG. 3) are not repeatedly crossed when global load values are close to the thresholds. Smoothing will tend to decrease the frequency of server reallocations.

Also, to protect against frequent server reallocations, several consecutive threshold violations are required before the reallocation process is triggered. For example, before reallocation of a server to compute-resource A, the present system may require two, three, four, or more consecutive measurements of G_(A) in excess of the upper threshold 33. Requiring several consecutive threshold violations will tend to reduce the frequency of server reallocations.

Alternatively, threshold violations for a minimum period of time may be required before server reallocation. For example, before reallocation of a server to compute-resource A, the present system may require one, five, or ten minutes of G_(A) in excess of the upper threshold 33.
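A requirement of several consecutive violations can be sketched with a small counter, as shown below. The class and method names are hypothetical. When measurements are taken at a fixed interval, a minimum violation period can be approximated with the same counter by setting the required count to the period divided by the interval; that mapping is an assumption of this sketch.

```python
class UpperViolationCounter:
    """Trigger only after `required` consecutive measurements above the threshold."""

    def __init__(self, required):
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self.count = 0

    def observe(self, g, upper_threshold):
        """Record one measurement; return True when reallocation should trigger."""
        if g > upper_threshold:
            self.count += 1
        else:
            self.count = 0  # any measurement within bounds breaks the run
        if self.count >= self.required:
            self.count = 0  # start a fresh run after triggering
            return True
        return False


counter = UpperViolationCounter(required=3)
readings = [0.95, 0.96, 0.50, 0.95, 0.95, 0.95]
triggers = [counter.observe(g, 0.90) for g in readings]
```

In this example the first two violations are interrupted by a reading within bounds, so reallocation is triggered only at the sixth measurement, after three violations in a row.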

The upper and lower thresholds for the compute-resources are easily changeable and programmable. Preferably, the upper and lower thresholds for each compute-resource can be adjusted by a compute-resource administrator. The compute-resource administrator may wish to adjust the upper and lower thresholds according to compute-resource conditions and type and amount of load. Preferably in the invention, the upper thresholds are not settable to values that correspond to metric values greater than the maximum metric values Mn_(t). Preferably in the invention, the maximum metric values Mn_(t) create a maximum setting for the upper threshold.

It is noted that servers can also be directly transferred between compute-resources, without being allocated to the free-pool. The use or lack of use of a free-pool is not a requirement of this threshold setting process, as the allocation procedure itself is not a part of this embodiment. However, whatever allocation process is used should ensure that any sensitive data is removed from a server before the server is allocated to a new compute-resource.

Also, it is noted that a server allocated to the free-pool necessarily does not perform functions related to compute-resources A and B. Servers in the free-pool are typically idle. Allocation of a server to the free-pool might not require any special type of allocation. The servers in the free-pool may be idle machines that are simply not allocated to any particular compute-resource or function.

It is noted that reallocation of a server does not require physical movement of the server. Typically, reallocation is performed by loading the server with a new image (operating system and application), and assigning it to the new compute-resource.

FIGS. 5A, 5B and 5C, taken together, show a flow diagram illustrating the method of the present invention. The steps illustrated in the flow diagram are described below:

Step 100: Metric types that are good representations of load for the given compute-resource are selected by the administrator using a standard management interface for each compute-resource.

Step 102: The maximum load point for each unique server type is found, and the selected metrics are measured.

Step 104: One of the server types is set as the standard server.

Step 106: The capacity weight for the metrics is calculated in terms of standard servers.

Step 108: The lower and upper global thresholds are set as allowable percents of the maximum load.

Step 200: Metrics are measured at regular intervals using a standard monitoring system.

Step 202: Normalized, smoothed average metric values are calculated.

Step 204: The current percent of the maximum load vector is computed.

Step 206: The global load values G are calculated from the normalized average metric values P and coefficients C₁, C₂, C₃. The coefficients can be selected by a compute-resource administrator.

Step 208: Thresholds are adjusted based on the current number of allocated servers.

Step 210: G values are compared to the upper and lower thresholds.

Step 212: A check is made to see if allocations are enabled.

Steps 300-324: Servers are reallocated if thresholds are violated.

It is important to note that “double exponential smoothing” or some other kind of data smoothing should always be used to remove temporary metric peaks and valleys. Smoothing can be performed at one or more steps in the method. For example, time smoothing can be performed when metrics are originally measured (before calculation of P values), on P values after the P values are calculated, and/or on G values after G values are calculated.

Also, in the present invention, more than one server can be moved when a threshold is violated. For example, if a measured G value greatly exceeds an upper threshold, then more than one server can be moved from the free-pool to the overloaded compute-resource. Also, since servers are not necessarily equivalent in the invention, the type of server can be selected according to the magnitude of the threshold violation. If a threshold is strongly violated, then a relatively powerful server can be moved to or from the free-pool.

Thresholds for Fault Detection

Normal load fluctuations make the use of a single, fixed problem determination threshold inadequate. The optimal response time threshold for fault identification will vary as a function of load. In general terms, when the average request response time does not match that predicted by the normal RT curve, there may be a fault in the system.

FIG. 4 shows the response time curve and its confidence bounds. Using nonlinear regression, we can fit a model to our normal RT/Load data. We then compute the simultaneous (based on all observations) prediction bounds for new observations, as illustrated in FIG. 4. The graph contains three curves: the fitted function, the lower confidence bounds, and the upper confidence bounds. The confidence interval can be set to whatever value is desired; 95% is typical. The response time threshold at any given time should be set to the point along the upper confidence bounds curve corresponding to the maximum anticipated response time under the current load conditions. Each server type will have to have its own threshold function based on its normal response time curve. One can additionally compute an aggregate CCR-wide response time curve and use its upper bound curve to identify faults that may not be limited to a single resource. A change point detection algorithm can also be used to detect deviations from the mean or variance.

System Components

FIG. 6 depicts the various system components for implementing the method according to the invention. The dashed lines represent offline flows, and the solid lines represent runtime flows. Metrics collected from the compute-resource's designated standard server 1a are used to set the compute-resource's capacity or maximum load vector 2. For each compute-resource and each server type that may be assigned to it, e.g., 1a, 1b and 1c, a set of capacity weights, which relate the server's metric values to those of the standard server, is created. Based on the current set of allocated servers and the present metric values, the current percent of maximum capacity is calculated by the present load deriver 3. These values are fed at runtime into the overload evaluator 4. Compute-resource and system configuration data are used in combination by the overload evaluator 4 to identify capacity overload. This in turn is used by the resource allocator 5 and the problem identifier 6, in addition to state and configuration data, to make allocation decisions.
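
As a non-limiting sketch, the runtime flow from the present load deriver to the resource allocator could look like the following. The function names, data layout, metric names, and numeric values are hypothetical; the arithmetic follows the capacity weight, normalized average P, percent of maximum load, and global load value G described in this document.

```python
def capacity_weight(max_on_type, max_on_standard):
    """Capacity weight of a server type for one metric, relative to the standard server."""
    return max_on_type / max_on_standard


def global_load(servers, max_load_vector, type_max, coefficients):
    """Compute the global load value G for one compute-resource.

    servers:          list of (server_type, {metric: measured_value})
    max_load_vector:  {metric: maximum value on the standard server}
    type_max:         {server_type: {metric: maximum value on that type}}
    coefficients:     {metric: weighting coefficient C_n}
    """
    g = 0.0
    for metric, std_max in max_load_vector.items():
        measured = [values[metric] for _, values in servers]
        weights = [capacity_weight(type_max[t][metric], std_max) for t, _ in servers]
        p_n = sum(measured) / sum(weights)   # average in standard server units
        percent_of_max = p_n / std_max       # one entry of the percent-of-maximum vector
        g += coefficients[metric] * percent_of_max
    return g


def decide(g, lower, upper, free_pool_size):
    """Compare G with the thresholds and name the resulting action."""
    if g > upper:
        return "allocate" if free_pool_size > 0 else "negotiate"
    if g < lower:
        return "release"
    return "hold"


# Illustrative data: one standard server and one server with twice its capacity.
max_load_vector = {"cpu": 100.0, "req": 200.0}
type_max = {
    "std": {"cpu": 100.0, "req": 200.0},
    "big": {"cpu": 200.0, "req": 400.0},
}
servers = [
    ("std", {"cpu": 60.0, "req": 120.0}),
    ("big", {"cpu": 120.0, "req": 240.0}),
]
coefficients = {"cpu": 0.5, "req": 0.5}

g = global_load(servers, max_load_vector, type_max, coefficients)
print(g, decide(g, lower=0.3, upper=0.8, free_pool_size=2))
```

Dividing by the sum of capacity weights rather than by the server count lets a more powerful server count as more than one standard server when the average is formed.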

While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

1. A load driven method for allocating servers among a plurality of compute-resources and a free-pool, wherein each compute-resource comprises a plurality of servers, the method comprising the steps of: for each monitored metric on the standard server and for each compute-resource, calculating a maximum metric value at a maximum load point as a maximum load vector for a compute-resource; setting lower and upper global thresholds as allowable percents of the maximum load point; for each compute-resource and unique server type and for each monitored metric, calculating a capacity weight for the monitored metric; monitoring each server allocated to a compute-resource for at least one metric; for each monitored metric and for each compute-resource, calculating an average normalized metric value P_(n) in standard server units; for each monitored metric and for each compute-resource, calculating a current percent of a corresponding maximum metric value as a current percent of maximum load vector; for each compute-resource, calculating one or more global load values G, wherein each global load value is a linear combination of normalized current percent of corresponding maximum metric values; for each compute-resource, dynamically adjusting lower and upper thresholds for the global load value; and for each compute-resource, comparing the calculated global load value G to the lower threshold and upper threshold, and performing an allocation of servers to compute-resources based on a comparison outcome.
 2. The method of claim 1, wherein following the comparison outcome, if a load is not predicted to continue for more than some minimum amount of time, do nothing.
 3. The method of claim 1, wherein following the comparison outcome, if some predetermined amount of time has not elapsed since a last capacity adjustment, do nothing.
 4. The method of claim 1, wherein following the comparison outcome, if servers are available in the free-pool and an overloaded compute-resource has a global load value G greater than the upper threshold, then removing a server from the free-pool and allocating it to the overloaded compute-resource.
 5. The method of claim 1, wherein following the comparison outcome, if servers are not available in the free-pool and an overloaded compute-resource has a global load value G greater than the upper threshold, perform resource-negotiation.
 6. The method of claim 1, wherein following the comparison outcome, if an under loaded compute-resource has a global load value G less than the lower threshold, and the following inequality is satisfied
${Curr\_Total}_{req/sec} < \frac{{Server\_Max}_{req/sec} * (N - 1)}{N} - \left( {Server\_Max}_{req/sec} - {Server\_Min}_{req/sec} \right)$
then removing a server from the under loaded compute-resource and allocating it to the free-pool.
 7. The method of claim 1, wherein the maximum load values contained in the maximum-load-vector correspond to the values measured on the standard server when load reaches the response time transition point
${MaxLV}_{compute\_resource} = \left( {M1}_{stdr}, \ldots, {MN}_{stdr} \right)$
 8. The method of claim 1, wherein a capacity weight of an nth metric on a given compute-resource is calculated according to the equation
${MWn}_{t} = \frac{{Mn}_{t}}{{Mn}_{stdr}}$
 9. The method of claim 1, wherein each normalized average metric value P is calculated according to the equation
$P_{n} = \left( \frac{\sum\limits_{s \in S} measured\_value\ M_{n(s)}}{\sum\limits_{s \in S} {MWn}_{t(s)}} \right)$
wherein P_(n) is the present value of metric n on server s in standard server units, m is the number of servers assigned to the compute resource.
 10. The method of claim 1, wherein the Current Percent of Maximum Load Vector (% CurrMLV) is calculated according to the equation
$\%CurrMLV = \left( \frac{P_{1}}{M_{1\_stdr}}, \ldots, \frac{P_{n}}{M_{n\_stdr}} \right)$
 11. The method of claim 1, wherein one or more global load values G are computed for each compute-resource, as a linear combination of normalized current percent of the corresponding maximum values according to the following equation
$Load = \sum\limits_{n = 1}^{N} \left( C_{n} * \%CurrMLV_{n} \right)$
 12. The method of claim 1, wherein dynamic upper and lower thresholds for the global load value are adjusted using the following equation
$Threshold\_Adjustment = \frac{Threshold\_High - Threshold\_Low}{Size\_Growth\_Interval}$
 13. The method of claim 1, wherein a deallocation process is inhibited unless the following inequality is satisfied
${Curr\_Total}_{req/sec} < \frac{{Server\_Max}_{req/sec} * (N - 1)}{N} - \left( {Server\_Max}_{req/sec} - {Server\_Min}_{req/sec} \right)$
 14. A computer readable medium containing code which enables a computer to perform a method for allocating servers among a plurality of connected compute-resources and a free-pool, wherein each compute-resource comprises a plurality of servers, the method comprising the steps of: for each monitored metric on the standard server and for each compute-resource, calculating a maximum metric value at a maximum load point as a maximum load vector for the compute-resource; monitoring each server allocated to a compute-resource for at least one metric; for each monitored metric and for each compute-resource, calculating an average normalized metric value P_(n) in standard server units; for each monitored metric and for each compute-resource, calculating a current percent of a corresponding maximum metric value as a current percent of maximum load vector; for each compute-resource, calculating one or more global load values G, wherein each global load value is a linear combination of normalized current percent of the corresponding maximum metric values; for each compute-resource, defining dynamically calculated lower threshold and upper threshold adjustments for the global load value; and for each compute-resource, comparing the calculated global load value G to the lower threshold and upper threshold, and performing a server allocation according to a comparison outcome.
 15. The computer readable medium of claim 14, wherein the method, following the comparison outcome, determines if load is not predicted to continue for more than some minimum amount of time, and if so, does nothing.
 16. The computer readable medium of claim 14, wherein the method, following the comparison outcome, determines if some predetermined amount of time has not elapsed since the last capacity adjustment, and if so, does nothing.
 17. The computer readable medium of claim 14, wherein the method, following the comparison outcome, determines if servers are available in the free-pool and an overloaded compute-resource has a global load value G greater than the upper threshold, and if so, removes a server from the free-pool and allocates it to the overloaded compute-resource.
 18. The computer readable medium of claim 14, wherein the method, following the comparison outcome, determines if servers are not available in the free-pool and an overloaded compute-resource has a global load value G greater than the upper threshold, and if so, performs resource-negotiation.
 19. The computer readable medium of claim 14, wherein the method, following the comparison outcome, determines if an under loaded compute-resource has a global load value G less than the lower threshold, and if so, removes a server from the under loaded compute-resource and allocates it to the free-pool.
 20. A system for allocating servers among a plurality of connected server compute-resources and a free-pool, wherein each server compute-resource comprises a plurality of servers, the system comprising: monitoring means for monitoring each server allocated to a compute-resource for a plurality of metric values; calculating means for calculating a normalized average metric value P for each monitored metric value and for each server compute-resource; combining means for linearly combining the normalized metric values to create a global load value G for each compute-resource; storage means for storing a defined lower threshold and a defined upper threshold for the linear combination value; comparing means for comparing the global load value to the lower threshold and upper threshold; and allocating means for allocating servers among compute-resources and the free-pool.
 21. The system of claim 20, wherein the allocating means responds to the comparing means in the case where an overloaded compute-resource has a global load value greater than the upper threshold by removing a server from the free-pool and allocating it to the overloaded compute-resource.
 22. The system of claim 20, wherein the allocating means responds to the comparing means in the case where an under loaded compute-resource has a global load value less than the lower threshold by removing a server from the under loaded compute-resource and allocating it to the free-pool.
 23. The system of claim 20, wherein the allocating means responds to the comparing means in the case where an under loaded compute-resource has a global load value G less than the lower threshold and an overloaded compute-resource has a global load value G greater than the upper threshold by removing a server from the under loaded compute-resource and allocating it to the overloaded compute-resource.
 24. The system of claim 20, further comprising means for calculating a capacity weight of each server type for each compute-resource.
 25. The system of claim 24, wherein server capacity weights are used in combination with current metric values to compute a present load as represented by each metric type.
 26. The system of claim 20, wherein a Current Percent Maximum Load vector is linearly combined with metric reliability weights to generate one or more global compute-resource weights for each compute-resource.
 27. The system of claim 20, wherein each compute-resource's upper and lower thresholds are dynamically adjusted.